DOCUMENT RESUME 



ED 274 315 



IR 012 292 



AUTHOR 
TITLE 

INSTITUTION 
SPONS AGENCY 

REPORT NO 
PUB DATE 
NOTE 

AVAILABLE FROM 
PUB TYPE 



EDRS PRICE 
DESCRIPTORS 



IDENTIFIERS 



Yao, S. Bing; Hevner, Alan R. 

A Guide to Performance Evaluation o£ Database 

Systems. 

Software Systems Technology, College Park, MD. 
National Bureau of Standards (DOC), Washington, D.C. 
Inst, for Computer Sciences and Technology. 
NBS/SP-500-118 
Dec 84 

58p. ; Part of the Series, Reports on Computer Science 
and Technology. 

Superintendent of Documents, U.S. Government Printing 
Office, Washington, DC 20402. 
Guides - Non-Classroom Use (055) — Reports - 
Research/Technical ( 143 ) 

MF01/PC03 Plus Postage. 

^Database Management Systems; ^Databases; ^Evaluation 
Criteria; ^Evaluation Methods; Indexing; Performance; 
Reaction Time; Research Design 
^Benchmarking 



ABSTRACT 

Benchmarking is one of several alternate methods o£ 
performance evaluation, which is a key aspect in the selection o£ 
database systems. The purpose of this report is to provide a 
performance evaluation methodology, or benchmarking framework, to 
assist in the design and implementation of a wide variety of 
benchmark experiments. The methodology, which identifies criteria to 
be utilized in the design, execution, and analysis of a database 
system benchmark, has been applied to three different database 
systems* representative of current minicomputer, microcomputer, and 
database machine architectures. This generalized methodology can 
apply to most database system designs. In addition to presenting a 
wide variety of possible considerations in the design and 
implementation of the benchmark, this methodology can be applied to 
the evaluation of either a single system with several configurations, 
or to the comparison of several systems. A summary of the report 
identifies the three principal phases of a database system 
benchmark — benchmark design, execution, and analysis — and notes that 
no generalized methodology can provide a complete list of 
considerations for the design of an actual experiment. Seventy 
references are listed. (DJR) 
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m he National Bureau of Standards* was established by an act of Congress on March 3, 1901. The 
M Bureau's overall goal is to strengthen and advance the nation's science and technology and facilitate 
their effective application for public benefit. To this end, the Bureau conducts research and provides: (1) a 
basis for the nation's physical measurement system, (2) scientific and technological services for industry and 
government, (3) a technical basis for equity in trade, and (4) technical services to promote public safety. 
The Bureau's technical work is performed by the National Measurement Laboratory, the National 
Engineering Laboratory, the Institute for Computer Sciences and Technology, and the Center for Materials 
Science. 



The National Measurement Laboratory 

Provides the national system of physical and chemical measurement; 
coordinates the system with measurement systems of other nations and 
furnishes essential services leading to accurate and uniform physical and 
chemical measurement throughout the Nation's scientific community, in- 
dustry, and commerce; provides advisory and research services to other 
Government agencies; conducts physical and chemical research; develops, 
produces, and distributes Standard Reference •Materials; and provides 
calibration services. The Laboratory consists of the following centers: 



• Basic Standards'' 

• Radiation Research 

• Chemical Physics 

• Analytical Chemistry 



77?^^ National Engineering Laboratory 



Provides technolo£,y and technical services to the public and private sectors to 
address national needs and to solve national problems; conducts research in 
engineering and applied science in support of these efforts; builds and main- 
tains competence in the necessary disciplines required to carry out this 
research and technical service; develops engineering data and measurement 
capabilities; provides engineering measurement traceability services; develops 
test methods and proposes engineering standards and code changes; develops 
and proposes new engineering practices; and develops and improves 
mechanisms to transfer results of its research to the ultimate user. The 
Laboratory consists of the following centers: 

The Institute for Computer Sciences and Technology 



Applied Mathematics 
Electronics and Electrical 
Engineering^ 

Manufacturing Engineering 
Building Technology 
Fire Research 
Chemical Engineering^ 



Conducts research and provides scientific and technical services to aid 
Federal agencies in the selection, acquisition, application, and use of com- 
puter technology to improve effectiveness and economy in Government 
operations in accordance with Public Law 89-306 (40 U.S.C. 759), relevant 
Bcecutive Orders, and other directives; carries out this mission by managing 
the Federal Information Processing Standards Program, developing Federal 
ADP standards guidelines, and managing Federal participation in ADP 
voluntary standardization activities; provides scientific and technological ad- 
visory services and assistance to Federal agencies; and provides the technical 
foundation for computer-related policies of the Federal Government. The In- 
stitute consists of the following centers: 



• Programming Science and 
Technology 

• Computer Systems 
Engineering 



The Center for Materials Science 

Conducts research and provides measurements, data, standards, reference 
materials, quantitative understanding and other technical information funda- 
mental to the processing, structure, properties and performance of materials; 
addresses the scientific basis for new advanced materials technologies; plans 
research around cross-country scientific themes such as nondestructive 
evaluation and phase diagram development; oversees Bureau-wide technical 
programs in nuclear reactor radiation research and nondestructive evalua- 
tion; and broadly disseminates generic technical information resulting from 
its programs. The Center consists of the following Divisions: 



Inorganic Materials 

Fracture and Deformation 

Polymers 

Metallurgy 

Reactor Radiation 



'Headquarters and Labo'aiories at Gaiihersburg, MD, unless otherwise noted; mailing address 
Gaithersburg. MD 20899. 

^Some divisions within the center are located at Boulder, CO 80303. 
^Located at Boulder, CO, with some elements at Gaithersburg, MD. 
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A GUIDE TO PERFORMANCE 
EVALUATION OF DATABASE SYSTEMS 



Daniel R. Benigni, Editor 



This guide presents a generalized performance 
analysis methodology for the benchmarking of data- 
base systems. The methodology identifies criteria 
to be utilized in the design, execution, and 
analysis of a database system benchmark. This 
generalized methodology can apply to most database 
system designs. In addition, presenting a wide 
variety of possible considerations in the design 
and implementation of the benchmark, this metho- 
dology can be applied to the evaluation of either 
a single system with several configurations, or to 
the comparison of several systems. 



Key words: Benchmark execution; benchmark metho- 
dology; benchmark workload; database systems; 
DBMS; indexing; performance evaluation; query com- 
plexity; response time. 



FOREWORD 



This report is one of a continuing series of NBS publi- 
cations in the area of data management technology. It con- 
centrates on performance evaluation, which is a key aspect 
in the selection of database systems. 

Benchmarking is one of several alternate methods of 
performance evaluation. It can be an expensive undertaking. 
However, this expense may be necessary for some applica- 
tions, e.g., those involving large databases or where 
response time requirements are critical. 

The purpose of this report is to provide a performance 
evaluation methodology, or benchmarking framework, to assist 
in the design and implementation of a wide variety of bench- 
mark experiments. The methodology has been applied to three 
different database systems representative of current mini- 
computer, microcomputer, and database machine architectures. 
Detailed results can be found in [YAO 84] . 

Other NBS publications addressing various aspects of 
data management system selection include: FIPS PUB 77 [NBS 
801 , NBS Special Publication 500-108 [GALL 84] , and a forth- 
coming NBS publication on "Choosing a Data Management Ap- 
proach." The advantages and disadvantages of benchmarking 
and other techniques for evaluating computer systems are 
discussed in NBS SP-500-113 [LETM 84] . 

References to commercial products as necessary to sur- 
vey results of previous work on performance evaluation are 
contained in this guideline. In no case does this imply 
recommendation or endorsement by NBS. 
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1. INTRODUCTION 



The rising popularity of database systems for the 
management of data has resulted in an increasing number of 
new systems entering the marketplace. As the number of 
available systems grows the difficulty in choosing th^ sys- 
tem which will best meet the requirements of a particular 
application environment also increases. Database systems 
have been implemented on many different computer architec- 
tures; mainframes, minicomputers, microcomputers, and as 
stand-alone database machines. The selection of a database 
system from among these varied alternatives requires a 
structured and comprehensive evaluation approach. 

A complete evaluation methodology for database systems 
must integrate both feature analysis and performance 
analysis phases. The range of features and capabilities 
that a database system may support is very large. Feature 
lists for database systems have appeared in a number of ar- 
ticles [CODA 76, AUER 81, WEIS 81a, WEIS 81b, BARL 81, DATE 
81, SU 81a, SU 81b, BROD 82] . 

A feature analysis perforins two functions; it first 
serves as a winnowing process to eliminate those database 
systems which are completely unsuitable for answering the 
needs of a particular application; and second, it provides a 
ranking of the surviving candidate systems. Feature 
analysis is a widely used method of database system evalua- 
tion. It has a number of significant advantages over other 
methods of system evaluation. 



1. A database system implementation is not required. 
Analysis is based upon documentation. Little or no 
system costs are involved in performing a feature 
analysis. This is critical for users with no system 

. access. 

2. Feature analysis provides a structured first cut for 
narrowing the range of potential database systems. A 
large number of systems can be evaluated effectively 
at one time. The result of a feature analysis should 
be a small number of candidate systems. Performance 
analysis, which is much more costly, need be per- 
formed on only this small number of systems. 

3. The list of features evaluated can be customized to 
an organizat long's application environment and 
presented at the level of detail desired by the 
designer. Among these features are qualitative 
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aspects of a database system that cannot be quanti- 
fied in terms of system performance. Examples in- 
clude vendor support, documentation quality, securi- 
ty, "user friendliness**, and reliability. Benchmark 
analysis cannot directly test the performance of 
these features. Thus, feature analysis remains the 
best method for their analysis. 

In spite of these advantages, feature analysis should 
not be used in isolation to evaluate and select database 
systems. There are several reasons: 

1. Feature analysis is a subjective exercise. Feature 
importance coefficients and the system support rat- 
ings needed in feature analysis are values which must 
be provided by a knowledgeable design expert. Howev- 
er, no two experts will come up with same values 
given the same application environment. At best, the 
feature analysis scores among different database sys- 
tems should be viewed as rough indicators of the sys- 
tems'* applicability. 

2. To obtain consistent scoring among different database 
systems the evaluator must be equally well acquainted 
with all. systems. This places a great burden upon 
one person to acquire this knowledge. If instead, 
different persons grade different systems, then the 
scoring consistency problems increase because grading 
standards must be set and closely controlled. 

3. The greatest disadvantage of feature analysis is that 
no true system performance is measured. Feature 
analysis is a paper exercise that cannot truly evalu- 
ate how a system will perform in an organization^'s 
application environment. 

The limitations of feature analysis introduce the need 
for a more rigorous evaluation method that can provide ob- 
jective, quantifiable differences among the candidate data- 
base systems. Performance analysis provides this type of 
evaluation. 

The goals of performance analysis techniques are to 
model a database system'^s behavior and gather performance 
data. This is done to identify the system^'s strengths and 
weaknesses fLUCA 71, FERR 81]. Performance analysis has 
been utilized on database systems for two purposes. The 
first is to evaluate a single system to determine the best 
configuration, or running environment, for that system. For 
example, new system algorithms (e.g., file management [STON 



f^^l'.'i'^f'y^^P^^^^^a^ion [HEVN 79]) can be tested before ac- 
^l^i^ implementing them m the system, m this way systems 
5h2 '^^..^""^"fo'^ their most efficient operating condition. 

application of performance evaluation on database 
systems has been to study two or more database systems, thus 
providing a comparison of the systems^ performance. 

Section 2 presents an overview of past research on the 
performance evaluation of database systems. The purpose is 
^f..hn?2n ^^^^ benchmark analysis is the most comprehensive 
fn« ™^?^? analyzing a single database system ot compar- 

i™h^» ?u2^f^® systems. An overview of the complete 

benchmark methodology is given in Section 3. The remainder 
™? discusses in detail the design, execution, and 

analysis steps required in the benchmark methodology. 



2. PERFORMANCE EVALUATION TECHNIQUES 



The major methods of performance evaluation are Analyt- 
ic Modelling, Simulation Modelling, and Benchmarking. A 
brief description of each method and a survey of previous 
work using the method for database system analysis is 
presented. The advantages and disadvantages of using each 
method are discussed. 



2.1 Analytic Modelling 

Analytic modelling represents a system by defining 
equations that relate performance quantities to known system 
parameters. The use of these equations allows a fast and 
accurate means to evaluate system performance. Two models 
that have been used predominantly to evaluate database sys- 
tem performance are queueing models and cost models. 



2.1.1 Queueing Models. 

The dynamic behavior of a database system can be viewed 
as a stochastic process and can be represented analytically 
as a queueing model [CCMP 78] . A database system is 
modelled as a multiple resource system with jobs moving 
through the system demanding services from the resource sta- 
tions. Queueing analysis provides the performance measures 
of system throughput, resource utilization, and job response 
time [MIYA 75]. The database system workload can be charac- 
terized by statistical parameters obtained from the datab^ 
requests [LEWI 76] . Because the database systems are usu^ 
ly quite complex, a queueing model normally can represe^ 
only a portion of its dynamic behavior. This is demonstrat- 
ed clearly by the attempt to analytically model the schedul- 
ing of an IMS (IBM^'s Information Management System) database 
system in [GAVE 76, LAVE 76]. However, certain aspects of 
database system processing are conducive to queueing model 
analysis. For example, queueing models have been used to 
analyze concurrency control algorithms [SMIT 80, POTI 80] 
and data allocation in distributed database systems [COFF 
81] . 



2.1.2 Cost Models. 

Cost analysis has been an effective way of obtaining 
performance estimates for physical database structures. The 
performance measures most easily obtained by cost analysis 
are storage costo and average response time for queries. 



Cost equations for inverted file systems have been developed 
in [CARD 73] . Generalized cost models have been pioneered 
in [YAO 74, 75, 77a, 77b] and further extended in [TEOR 76] 
and [BATO 82]. The cost model approach has been used to 
analyze tine ^Performance of query processing in various rela- 
tional database systems [YAO 78, 79] . These models and 'jost 
functions have been useful for performing critical path 
analysis for database system applications. Hawthorne and 
Dewitt [HAWT 8 2] have developed cost models to evaluate 
query processing among different proposed database machine 
designs. A performance analysis of hierarchical processing 
in an IMS database system has been performed using cost 
models [BANE 80] . 

Analytic modelling has proven useful in many areas of 
database modelling. However, analytic models have some ma- 
jor disadvantages. Queueing models are inadequate to model 
the complete range of functionality found in a database sys- 
tem. Cost modelling fails to account for the dynamic 
behavior of the database system. For these reasons, analyt- 
ic modelling has failed to receive wide acceptance as a tool 
for modelling database systems. 



2.2 Simulation Modelling 

Most real world systems are too complex to allow real- 
istic models to be evaluated analytically. Simulation is 
the process of approximating the behavior of a system over a 
period of time. The simulation model is used to gather data 
as an estimate of the true performance characteristics of 
the system. Simulation modelling has been applied to data- 
base systems as illustrated in the following survey of 
representative work performed in this area. 

A database system simulator, PHASE II, has been 
developed for the analysis of hierarchical data structures 
[OWEN 71] . PHASE II is an effective tool for evaluating 
hardware configurations, data structures, and search stra- 
tegies. Another simulation model has been used to model the 
UNIVAC EMS-1100 database system [GRIP 75] . Hulten and 
Soderland designed the ART/DB simulation tool to investigate 
a multiprogrammed database system containing multiple CPUs 
[HULT 77] . This simulation tool is written in SIMULA and 
has an interactive interface. 

A simulation model containing four modelling components 
(application program, database system, operating system, and 
hardware environment) was reported in [NAKA 75]. This simu- 
lation tool has two major sections: a definition section and 
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a procedure section. The definition section describes the 
system environment being simulated while the procedure sec-- 
tion represents the software system using an instruction set 
prepared by the simulator. This type of programming inter-- 
face allows a user to modify the system parameters while 
running a series of simulation programs. In another paper 
an IDS simulator is described [APPL 73]. A DBMS evaluation 
methodology through the integrated use of a limited proto-- 
type implementation for the DBMS design, a flexible measure-- 
ment facility, and a predictive model based on the DBMS pro- 
totype was developed in [DSUT 79] . 

Similar to analytic models, simulation models are most 
often used to study specific types of database system pro- 
cessing. For example, recent simulation studies have 
analyzed optimal granule size for database locking [RIBS 77, 
RIES 78] , centralized versus decentralized concurrency con- 
trol algorithms [GARC 78], and run-time schema interpreta- 
tion on a network database system [BARO 82] . 

Although simulation modelling can be useful in systems 
which are too complex for analytic modelling methods, there 
are some disadvantages [LAW 82] . The major concern is the 
time and expense that are often necessary to develop a simu- 
lation model. Stochastic simulation models also produce 
only estimates of a model^s "true" performance and the large 
volume of results returned by a simulation often creates a 
tendency to place more confidence in the results than may 
actually be warranted. As the simulation grows more com- 
plex, the difficulties in program verification increase 
correspondingly, making the validity of the results more 
difficult to determine. 



2.3 Benchmarking 

Benchmarking is used when a few database systems are to 
be evaluated and compared. Benchmarking requires that the 
systems be implemented so that experiments can be run under 
similar system environments. Benchmarks are costly and time 
consuming but provide the most valid performance results 
upon which database systems can be evaluated. In database 
benchmarking, a system configuration, a database, and a 
workload to be tested are identified and defined. Then 
tests are performed and results are measured and analyzed. 
The workload can be either representative of the planned ap- 
plication of the system (an application-specific benchmark) 
or designed to allow for an overall system evaluation (a 
general benchmark) . Running the workload on several systems 
or several configurations of the same system will supply in- 
formation which can be used to compare and evaluate the 
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separate systems or configurations. 



Although simulation and analytic modelling have been 
useful in nK^delling aspects of database system behavior , 
benchmarking can apply to the complete database system func-- 
tionality. While both simulation and analytic modelling are 
limited in the scope of their system testing, benchmarking 
offers the chance to evaluate the actual database system 
[GOFP 73]. Previous work on benchmarking database systems 
has been performed for two primary purposes. On a single 
database system, different implementation algorithms or dif- 
ferent system configurations can be tested. For multiple 
database systems, the performance of the different systems 
on the same database and workload can be compared [DEAR 78] . 

Early database benchmark experiments concentrated on 
the comparison of candidate commercial systems for a partic- 
ular application. For example, in [SPIT 77] an evaluation 
of three systems was described. The three systems tested 
were System 2000, GIM, and DMS-1100 . All three systems ran 
on the UNIVAC CS-1100 computer system and were evaluated 
utilizing a specially designed monitoring system; the MITRE 
Performance Evaluation System, PES-1100 . In [GLES 81], the 
benchmarking of several c arcial systems to find the best 
system for the U.S. Publiu ealth Service is described. One 
other early article [HILL 77] documented a performance 
analysis performed to select the best candidate system for 
nasal's Flight Planning System. 

In the academic environment several studies have been 
performed on single database systems to evaluate performance 
and test enhancements. The System R access path optimizer 
is studied in [ASTR 80] . Benchmarks on the INGRES database 
system were reported in [YOUS 79, KEEN 81]. In [HAWT 79] 
detailed benchmarks were used to identify some possible 
enhancements to improve the performance of INGRES. The 
results of this study showed that dramatically increased 
performance on INGRES could be achieved by implementing a 
combination of extended memory, improved paging, and multi- 
ple processors. A later article [STON 83] described the ef- 
fects of four enhancements to the INGRES system. The 
enhancements were dynamic compilation, microcoded routines, 
a special purpose file system, and a special purpose operat- 
ing system. The results showed that while all four enhance- 
ments Improved performance to some degree, the cost associ- 
ated with the improvements were significant. While the com- 
pilation and file system tactics produced a high return and 
were relatively easy to acconplish, the microcode and spe- 
cial operating systems resulted in somev^at less of an im- 
provement, and at a high cost. In [STON 82] and [EPST 80] 
the distributed INGRES database system was analyzed by 



studying the performance results of queries. 



More recently, comparisons of the performance of two or 
more database systems have been published. System Develop- 
ment Corporation has performed several comparison studies of 
benchmarks of database systems on a DEC VAX system [LUND 82, 
TEMP 82]. The first article compared ORACLE version 2.3.1 
to the IDM-500 release 17. A report by Signal Technology, 
Inc. (STI), [SIGN 82] showed a comparison of STi's OMNIBASE 
to INGRES version 1.2/09. This article focused on specific 
test results and did not attempt to make an overall com- 
parison of the two systems. 

Bitton, DeWitt, and Turbyfill have described a custom- 
ized database, a comprehensive set of queries and a metho- 
dology for systematically benchmarking relational databases 
[BITT 83] . In this study, testing was done on a synthetic 
database which was specifically designed for benchmarking. 
The benchmark included selections, projections, joins, ag- 
gregates, and updates on both the INGRES system, m two dif- 
ferent configurations, and the IDM-500 database machine. 
The INGRES database systems, the 'university and 
cial' versions, were implemented on a VAX 11/750. The IDM- 
500 database machine was connected to a PDP 11/70 host and 
was studied both with and without a database accelerator. 
The purpose of the study was to test and compare the four 




multiple user environment, by performing similar testing, 
with multiple users, on the IDM-500 database machine and 
ORACLE database system. 

In [BOGD 83] an experiment in bendhmarking a database 
machine was reported. The purpose of the paper was to 
present an approach to benchmarking database machines using 
a generated database. The paper describes a database gen- 
eration tool which allows the user to build a synthetic 
(generated) database through an interactive interface. The 
description of the testing was quite general. The system 
configuration of the database machine was not fully 
described. The testing in this research was limited to the 
single-user case. The paper provided a summary of the test- 
ing results and numerous graphs plotting the results. 

While benchmarking can be a useful and important tech- 
nique for database system evaluation; designing, setting up, 
and running a benchmark is a diffic-ult and time consuming 
task. In order to aid in the development and analysis of 
benchmarks it is essential that a generalized methodology be 
designed. While some work in this area has been done [TUEL 
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75, RODR 75, WALT 76, BITT 83, BOGD 83], no one methodology 
has provided the necessary robustness demanded of a general- 
ized methodology. Most of the methods presented have been 
either tied to a limited number of systems, or have not 
rigorously addressed the possible testing variables and 
design characteristics necessary for a generalized methodol- 
ogy. In order to apply to many types of evaluation (e.g., 
general vs. specific, single system vs. many systems), a 
methodology must discuss many possible design and implemen- 
tation features while providing guidance in the design of 
any benchmark experiment. In the next section an overview 
of the methodology is presented. 



3. A BENCHMARK METHODOLOGY FOR DATABASE SYSTEMS 



Managing a database requires a large, complex system 
made up of hardware, software, and data components. A 
benchmark methodology for database systems must consider a 
wide variety of system variables in order to fully evaluate 
performance. Each variable must be isolated as much as pos- 
sible to allow the effects of that variable, and only^that 
variable, to be evaluated. Because of the complex, interac- 
tive nature of database systems it is often very difficult, 
if not impossible, to do this. The benchmark methodology 
developed here enables a designer to identify the key vari- 
ables of a database system to be evaluated. In this section 
a synopsis of the methodology is presented. 

The benchmark methodology for database systems consists 
of three stages: 



1. Benchmark Design - Establishing the system environ- 
ment for the benchmark; involves designing the system 
configuration, test data, workload, and deciding on 
the fixed and free variables of the benchmark stu- 
dies. 

2. Benchmark Execution - Performing the benchmark and 
collecting the performance data. 

3' Benchmark Analysis - Analyzing the performance 
results on individual database systems and, if more 
than one system is benchmarked, comparing performance 
across several systems. 

Figure 3.1 illustrates the methodology as a flow chart. 
In this section, an overview of each phase is presented. 
The remainder of the report will discuss each phase in de- 
tail. 



3.1 Benchmark Design 

The design of a benchmark involves establishing the en- 
vironment of the database system to be tested, and develop- 
ing the actual tests to be performed. The four steps of the 
benchmark desiqo phase are described below. For a compara- 
tive benchmark "<5 >r several database systems, the benchmark 
design must be ut r-ariant over all systems. 
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3,1.1 System Configuration. 



The hardware and software parameters, such as main 
memory size, the number and speed of the disK drives, 
operating system support for database system requirements, 
and load scheduling policies will be determined in this 
step. Often the hardware and software configuration is 
given. This is usually the case when the database system is 
to be added to an existing computing system. Also, many da-- 
tabase systems can be installed on only one or very few 
types of operating systems. Cost is virtually always a fac- 
tor and, for many applications, will be the primary deter- 
minant of which system configuration is actually chosen. 

The parameters related to configuration that can be 
varied in the testing include maximum record length, the 
blocking factor of data in the storage system (e.g. the 
amount of data transferred by one disk access), the number 
of allowable indexes on relations, the maximum size and 
number of attribute values in an index, and the other types 
of direct access retrievals and their costs. 

3.1.2 Test Data. 

Among the parameters considered here are the test data ^ 
base, the background load, and the type and amount of index^ 
ing. The database on which the testing will be performed 
can be generated using one of two methods. The traditional 
method has been to use an already existing database, refor^ 
matting it for the benchmark needs. Recently, however, the 
approach of generating a synthetic database has been gaining 
popularity. Both techniques are discussed in Section 4. 

3.1.3 Benchmark Workload. 

A transaction load consists of several qualitative and 
quantitative aspects. Some of the qualitative aspects re- 
lating to transaction load are: the types of queries which 
occur (e.g., simple retrieval on a single relation or com- 
plex queries involving many files), the possible modes used 
for modification of the database (e.g., batch updates or in- 
teractive updates), the level of user-system interaction 
(e.g., on-line or batch access), and whether or not users 
commonly access the same data files. Some of the quantita- 
tive aspects of the transaction load include: the percentage 
of time that each type of database action is performed, the 
average amount of data returned to a user per transaction, 
the average number of users, and the number of users in- 
volved in each type of transaction. Therefore, the transac- 
tion load defines not only the number of users present in 



the system, but also the 
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introduced into the system by each user. 



,3.1.4 Experimental Design. 

In this important phase of the benchmark design, param- 
eters are selected to be varied in the benchmark testing. 
Values to be used for the parameters must also be defined. 
It is vory important to choose values that, while within 
reason for the system being tested, push the system to its 
performance limitations. Among the parameters to be con- 
sidered are database size, background load, number of 
indexes, query complexity, and number of simultaneous users. 

It is also in this phase of the benchmark design that 
the criteria to be used for evaluation are considered. It 
IS important to realize that the planned use of the system 
to be selected will have a definite relationship to the main 
measurement criteria on which the systems are evaluated. 
For example, if the system is expected to be used heavily 
and is likely to become CPU bound, system utilization or 
throughput would most likely be the main measurement cri- 
teria. On the other hand, if the system is more likely to 
be run under a light or moderate workload, response time 
would most likely be the roost important criteria. The 
selection of measurement criteria for the experimental 
design is discussed in greater detail in Section 4.4. 



3.2 Benchmark Execution 



After the time-consuming and complex task of designing 
the benchmark is completed, the next step is to execute the 
experiments. It would make benchmarking a much less compli- 
cated task if the benchmark could be implemented exactly as 
designed on each individual system to be tested. In reali- 
ty, this is seldom the case. Each system has its particular 
design and limitations to be considered. The benchmark has 
to be tailored to each specific system involved in the test- 
ing. Benchmark execution involves the steps of benchmark 
initialization, benchmark verification, and the actual 
benchmark tests. These steps are explained further in Sec- 
tion 5 . 
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3.3 Benchmark Analysis 



Benchmark experiments normally produce large amounts of 
output data that are too burdensome to evaluate. The final 
phase of a good benchmark experiment^ therefore^ must be a 
concise summary of the results obtained. This summary 
should point out the interesting results of the experiment 
and attempt to explain the reasons behind those results. A 
good summary will also present graphs relating testing 
parameters to response measures and matrices comparing 
results under varying variables. Benchmark analysis in- 
volves forming the raw performance data into graphs and 
tables that clearly illustrate observations and comparisons 
on the systems benchmarked. Benchmark analysis is fully 
described in Section 6. 

Two types of benchmark analysis can be performed based 
upon the objectives of the benchmark testing. 

1. Individual System Analysis - For each tested system , 
the data are analyzed to provide observations on the 
performance of the database system under varying sys- 
tem algorithms and conditions. 

2. Comparative System Analysis - When multiple systems 
are being studied, performance data can be compared. 
This analysis should provide a basis to make state- 
ments as to critical comparison among several data- 
base systems. 



4. BENCHMARK DESIGN 



The benchmark design is made up of three areas which 
provide input to the final step of experimental design. 
These three areas; system configuration, test data, and the 
benchmark workload; as well as other factors involved in the 
experimental design, are discussed in this section. 



4.1 System Configuration 

System configuration consists of a wide variety of 
parameters which relate to both hardware and software. The 
hardware parameters include main memory size, the number and 
speed of disk drives, and data blocking. The software 
parameters include the operating system support, scheduling 
policies, and query optimization. Below is a list of some 
of the parameters considered in this phase and a brief dis- 
cussion of each. A more detailed list of parameters can be 
found in [SU 81a, SU 81b] . 



4.1.1 Hardware Parameters . 



1. Main memory size consists of the number of bytes of 
memory available for programs, user work areas, and 
input/output buffers. 

2. Secondary storage consists of the number and type of 
disk drives. Parameters include disk capacity, seek 
time, and data transfer speed. 

3. The configuration and speed of the communication 
lines in the system are important features that ef- 
fect database system performance. 

4. The speed of the CPU has an effect on the response 
time since most database systems experience CPU sa- 
turation conditions. 
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4>1>2 Software Parameters 



!• Menrtory page size has a direct effect on the locality 
of data allocation. Large page size enhances the 
clustering effect but requires larger buffer space. 
Unfortunately most systems do not permit the user to 
specify the page size ao a parameter. 

2, Indexing directly affects the data retrieval effi- 
ciency. Index parameters should include the type of 
index supported and any restrictions on the number of 
indexes permitted. 

3, Operating system support and scheduling are often 
functions of the chosen operating system and are 
therefore difficult to test. 

4 • The query optimization scheme utilized by the 
software is not necessarily always the best method 
available. Comparison of an alternative method can 
often provide an interesting result. Although the 
query optimization algorithm is internal to a data- 
base system, some alternative algorithm^'s effect can 
be simulated through carefully arranged queries. 

5, Database system control algorithms, such as con- 
currency control and recovery algorithms, may also be 
tested as to their effect on performance. Some data- 
base systems allow differing levels of control by 
setting system parameters. New control algorithms 
can be tested by adding the programs to the database 
system. 

Many of the hardware and software parameters listed 
above are given, especially when the database system is to 
be added to an existing computer system. Often a database 
system can be installed on only one, or very few, types of 
operating systems. Therefore, testing is further con- 
strained in regard to the selection of configuration parame- 
ters. It is usually difficult to vary database parameters 
such as buffer size and page size. 



4,2 Test Data 

A database is represented on a logical level by a con- 
ceptual schema that provides an overall view of the data and 
data relationships in the system. At this level the data- 
base is defined in terms of relations, records, attributes, 
and domains (using relational terminology) • Hierarchical 
and network systems can be described in the appropriate data 
model terminology. At the physical level of representation 
the size and storage requirements of the database system 
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must be considered. In addition to the storage required to 
hold the data values in the database, access structures such 
as indexes must be included in the storage costs. Also, 
certain data may be duplicated for reliability or retrieval 
efficiency. 

4.2.1 Const ructtng the Database. 

One of the major considerations in any benchmark exper- 
iment is that of what test data will be used for the test- 
ing. The database used in the testing must be implemented 
on each of the candidate systems to be tested and after im- 
plementation must remain constant over all systems. There 
are basically two methods for obtaining a test database: us- 
ing an already existing application database or developing a 
synthetic database. 

A pplication Database 

The traditional method has been the use of real data 
from an application database. By ""real data'* is meant data 
that is being used, or has previously been used, for appli- 
cation purposes. If real data is to be used it must be for- 
matted into the appropriate form for each system to be test- 
ed. If several systems are to be tested, the data must be 
formatted for each of the systems, if the database systems 
involved in the testing are not all of the same type (e.g. 
relational, hierarchical, or network) , this formating can 
become a time consuming exercise in database design. Even 
when the systems involved are the same type, the loading and 
setting up of the database can produce unexpected problems. 
The use of real data, however, demonstrates database system 
performance on realistic application environments. This is 
clearly the best method when the evaluation is done to 
select a system for a known database environment. 

Synthetic Database 

The second method, the use of synthetic databases, has 
been gaining popularity in recent studies. When using this 
method, synthetic data is generated to make up a database 
which easily lends itself to benchmark testing. Attributes 
are allowed either integer or character values. Key attri- 
butes are assigned unique integer values. For example, for 
a relation with 10,000 tuples, the key attribute may take 
the values 0, 1, ... , 9999. The numbers can be scrambled 
using a random number generator. Other attributes are 
designed so that they contain non-unique values. The main 
purpose of these attributes is to provide a systematic way 
of modelling a wide range of selectivity factors. For exam*- 
pie, a relation could be designed containing an attribute 
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with a uniform distribution of the values between 0 and 99. 
By utilizing the random number generator to create 5000 oc- 
currences of this attribute into a relation, then queries 
can be easily designed with selectivit ies of 10%, 20%, , 
90%, or any other percentage that is of interest in testing. 
Since the attribute has only 100 distinct values in the 5000 
occurrences, 10% of the relation could be retrieved simply 
by running the following queries (using SQL): 

SELECT <all> 

FROM <relation> 

WHERE <attribute> < 10 

or 

SELECT <all> 

FROM < r e la t ion> 

WHERE <attribute> > 89 

Such a design allows for much greater control over selec- 
tivity and can lead to a more precise result. 

A major concern with the use of synthetic databases 
lies in the question of independence between attributes 
within a relation* In order to be certain that the attri- 
butes are truly independent each attribute within the rela- 
tion must have an occurrence for each and every value of 
every other attribute in the relation. For example, a rela- 
tion with two attributes (attr_l and at^r_2) containing 
values 0 through 1 and 0 through 2 respectivelv would have 
to contain the following records to demonstrate true attri- 
bute independence lULLM 82] . 
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Table 4.1: Independent Attributes 



attr i 


attr 2 


0 


0 


0 


1 


0 


2 


1 


0 


1 


1 


1 


2 



Obviously as the size of the relation grows, maintaining 
this independence leads to a very large relation. 

While the use of a synthetic database does add a cer- 
tain amount of control there are some drawbacks to using 
synthetic data. Synthetic relations fail to incorporate the 
complex data dependencies inherent to real data. For exam- 
ple, in a real database attributes within a relation, such 
as years of service and salary, have definite correlations 
which synthetic databases do not provide. Another factor to 
consider in choosing application vs. synthetic databases is 
the purpose of the testing. If the benchmark is being per-- 
formed in order to select a system for a specific applica- 
tion it would obviously be preferable to test some data that 
would be used in the application. The use of a synthetic 
database is most suitable when designing a general benchmark 
over several database systems. However, the use of synthet- 
ic data takes away a cerfuin measure of real world applica- 
bility present when using real data. 

4.2.2 Database Size. 

Database size is a key parameter and should be tested 
at various levels. Database sizes, "small" to "large", 
should be identified by studying the application system or 
the testing environment. In its use here "large" means the 
largest database the application system is likely to use or 
the largest database available for test ing. The term 
"small represents the point where the best performance is 
expected for the explication. These points will often be 
identified during the benchmark and should be estimated ini- 
tially. 

Benchmark testing should begin on the smallest test da- 
tabase. By stepping to larger sizes, performance changes 
can be readily noted. When large performance gaps are no- 
ticed between database sizes additional database sizes may 
be tested in order to diecwet the point where the database 
size causes performance deterioration. Although availabili- 
ty of data may often dictate how large a database is used in 
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the benchitiarKr it is important to set the upper size at an 
adequate level to assure that possible applications will not 
normally exceed that level. By identifying a higher level 
than would normally be attained the effects of possible fu- 
ture growth can be evaluated. 

The choice of the sizes of the databases to be tested 
will also be directly related to the system being tested and 
the resources available. If the testing is to be done on 
different configurations of one system, or comparing very 
similar systems, it is quite liKely that each system could 
be tested on all of the sizes of the database to be tested. 
If, on the other hand, the ti>sting is comparing aspects of 
systems of different sizer^ >\ith differing capabilities, the 
database sizes tested on one system may be limited to a sub- 
set of the sizes tested on a larger system (e.g., micros and 
minis) . 

4.2.3 Indexing. 

It is important to test the effects that indexing has 
on the performance of the systems being tested. Index lev- 
els should be set that will allow the systems to show 
differences in performance related to using the indexes. 
The transactions in the benchmark workload should be 
designed to highlight the potential performance benefits and 
costs of using indexes. 

Some of the possible index levels that could be select- 
ed include: 

0. l?o indexing . Studying results with no indexing pro- 
vides a basis for comparison of the change in perfor- 
mance when utilizing indexes and is therefore essen- 
tial. 

1. Clustered indexes on key attributes . A clustered in- 
dex is an index on an attribute v^ose data values are 
stored in sequential order. Clustered indexes on 
keys can be used effectively for retrieval and join 
operations. 

2. Non clustered ind exe s on secondary key attributes . 
Secondary key indexes^ if used properly, will enhance 
the performance of queries that contain selection 
conditions using these keys. 
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3* Complete indexing > Indexes are placed upon all at- 
tributes in the database* The benefits of coniplete 
indexing must be weighted against its costs* 

Database systems can also be tested for their ability 
to provide combined indexes . A combined index is one that 
ranges over two or more fields [STON 74] • Combined indexes* 
can be defined on attribute groups that appear frequently 
together in queries. 



4*3 Benchmark Workload 



The benchmark workload is the heart of the benchmark 
experiment. While the system configuration and test data 
define the running environment, the benchmark workload de- 
fines the tests that will be run on this environment. By 
choosing a variety of transactions and then modelling user 
and system workloads by utilizing a job scripts model, a 
variety of benchmark parameters can be tested. The job 
scripts model is defined in Section 4.3.3. Transaction 
ty|?es that should be considered in the testing are discussed 
in the next section. 



4.3.1 Transactions. 



Each system, and each user on that system, is involved 
in a variety of transactions. A transaction is defined here 
as a well-defined user action on the database. In testing 
a database system, a variety of transaction types should be 
run. I. benchmark should include the following types of 
transactions: 



1. Single - Relation Queries - These queries involve only 
one relation. Testing on single relation queries 
should include queries on different sizes of rela- 
tions, queries retrieving all or part of a relation, 
queries with and without indexes, and queries using 
aggregates and sorting functions such as "group-by" 
and "order-by". 

2. Multi - Relation Queries - These queries involve more 
than one relation. Testing should include all of the 
variables discussed in the single -relation queries. 
A benchmark should also include testing on different 
join methods and varying the number of relations that 
are included in the query. Joins must be tested with 
and without indexes, and with different sequences of 
joining the relations. 
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3. Updates - Updates include functions such as modifica*- 
tionr insertion, and deletion* Testing on updates 
must be performed carefully. '^hese queries change 
the state of the database. Tt the effects of the up- 
dates are not removed, further testing on the data- 
base will not be performed on exactly the same data. 
Theref:"*re the update effects must be removed from the 
database before further testing. 



4.3.2 User-System Environment. 

The next consideration in the design of the benchmark 
workload is the area of the user-system environment. This 
environment is a combination of factors including whether 
the system is on-line or batch, the frequency at which tran- 
sactions enter the system, utilities offered by the system, 
and the programming interface. 

There are basically two methods of executing transac- 
tions; batch and on-line. 



1. Batch - A batch transaction is submitted and run with 
no interaction between a user and his job during pro- 
cess ing. 

2. On- line - An on-line transaction allows the user to 
Interact with the program. 



The frequency at which transactions enter the system is 
another factor that should be considered in the workload en- 
vironment. Considerations regarding the amount of think 
time between transactions, the number and type of transac- 
tions, and the frequency of transactions on the system as 
the number of users grow, all must be taken into account. 
The job-scripts model defined in the next section is a con- 
venient method of modelling these factors. 

The utilities offered by the system are of prime con- 
cern to the database administrator and their functionality 
will be important to him. The utilities include creating^ 
loading, and extending a table, creating an index^ database 
dump, recovery, and checkpointing. User utilities may in- 
clude sort packages, statistical routines, and graphics 
packages that can be interfaced with the database system. 
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4,3,3 Job-Scripts Model, 



A job scripts model will be utilized to model the tran- 
saction load of the each user. By defining several distinct 
transaction loads and running them concurren tly, a multi- 
user environment can be modelled. 

A transaction load consists of several qualitative and 
quantitative aspects. Some of the qualitative aspects re- 
lating to transaction load are: the types of queries which 
occur (e,g,r simple retrieval on a single relation or com- 
plex queries involving many relations) , the possible modes 
used for modification of the database (e,g,r update through 
menu modification, batch update Sf updates in conjunction 
with queries) r the level of user-system interaction (e,g,r 
on-line vs. batch access) , and v^ether or not users commonly 
access the same data files. 

Some of the quantitative aspects of the transaction 
load include: the percentage of time that each type of data- 
base action is performed, the average amount of data re- 
turned to a user per transactionr the average number of 
userSf and the number of users involved in each type of 
transaction. Therefore, the transaction load defines not 
only the number of users present in the system, but also the 
quality and degree of activity introduced into the system by 
each user. 

Let a set of defined transactions be represented as T= 
{ t(l)r t(2)r ,,, r t(n) } v^ere t(i) represents the i-th 
transaction, 

A job script represents a typical workload of an appli- 
catiorr. user on the system. Let S be a set of pre-defined 
job scripts, s(i) r of the form: 

S(i) = < t(j(l)), X(l)r t(j(2))r X(2), ,,, r t(j(m)) > 

where t(j(lc)) is a transaction from the set T 

and x(i) stands for the average think time found between 
successive user transactions. The x(i) parameters can also 
represent the interarrival times of transactions into the 
system. Job scripts can be designed to characterize a par- 
ticular type of user in the system. For example , a database 
user may be described as' retrieval intensive or update in- 
tensive. In either case a job script can be designed for 
that user by selecting transactions that best represent the 
user^'s typical processing. 
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The job script model consists of two defined sets: the 
job scripts, S = { s{l)r s{2), .... s(p) ], and the set of 
users^ U = { u(l), u(2)r / u(r) }. The workload for the 

system is defined by the number of users on the system and 
the assignment of users to job scripts. Each benchmark 
study is parameterized by the mix of job scripts that the 
systems execute. The assignment of users to one or more job 
scripts provides a very effective and clear way to charac- 
terize this job mix. An additional advantage of this job 
scripts model is that it can be easily implemented by a test 
generator program in the actual benchmark study. 

A job script file for each user is read into a bench- 
mark runner program that executes on the host computer sys- 
temT The runner executes the transactions in job script 
order on the database system. This program also gathers 
performance data from the database system by recording 
statistics from hardware and software monitors. As an exam- 
ple r the following performance data can be collected on a 
database system for each transaction: 



1. Parse time - The time required to parse the transac- 
tion and send it for execution. 

2. Execution time - The time required to develop an ac- 
cess strategy for a transaction. 

3. Time to first record - Time until the first result 
record is presented to the user. 

4. Time to last record - Time until the last result 
record is presented to the user. 

5. Number of records retrieved - The result size of the 
transactToh; the size in bytes of the records should 
also be collected. 

The benchmark runner algorithm can be outlined as fol- 
lows : 

Algorithm Runner: 
Begin 

Read Job-script-file into trans-array until EOF; 

Open database system; 

While (trans-array not empty) do 
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Read next transaction from trans-array; 
Parse transaction and send to database system; 
Exec u te tr a nsac t io n ; 
Record time statistics on: 

time-to-parse 

t ime - to -e X ec u t e 

time -to-first 

time -to -last; 
Record size statistics on: 

number of records in result 

size of result records; 
Print gathered statistics; 

End while; 

Close database system; 

End of Algorithm, 

For each benchmark test a job script is defined and ex- 
ecuted on the different database systems to be tested. 
Statistics for each transaction in the job script are print- 
ed in a convenient format, For multi-user tests on the da- 
tabases, multiple copies of the benchmark runner are run 
simultaneously on separate job scripts. Statistics are 
gathered for each job script, 

4,3.4 Background Load. 



The accurate evaluation of any system must take into 
account the type and amount of non-database work being per- 
formed by the host computer system. Based on typical system 
usage, as defined by the application, a number of non- 
database programs should be designed for background execu- 
tion on the tested systems. These programs should be 
modelled using a job script in much the siame manner as the 
user transaction loads. In this way measurements can be ob- 
tained for the effects of the background load on the data- 
base load, while the effects of the database load on the 
background load can also be measured. By enlisting the job 
script approach, parameters on the background load such as 
arrival rate of the programs, type of programs in the back- 
ground load (e.g., CPU-bound vs, I/O bound) and priority 
given to programs in the background load, can be varied. 
Different background loads can be utilized in identifying 
system saturation points for the combined database and non- 
database system loads. 
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4.4 Experimental Design 



Now that the running environment and possible alterna- 
tives for testing parameters have been defined r the parame- 
ters to be used in the testing should be selected. In order 
to properly evaluate the testing it will be necessary to set 
fixed values for most parameters, while testing others at 
various levels. 

It is also at this stage of the benchmark design that 
the performance criteria to be used for evaluation are con- 
sidered. The evaluation criteria selected are an essential 
key to understanding and* correctly interpreting the bench- 
mark results. In this section the selection of the measure- 
ment criteria will be discussed, as well as a review of the 
possible experimental parameters that can be varied in the 
testing. 

4.4.1 Performance Measurement. 

The relevant measures that may be considered for use in 
the performance evaluation include system throughput, utili- 
zation, and response time. Each of these will be dis- 
cussed in the following paragraphs. 



1. System throughput is defined as the average number of 
transactions (e.g., queries) processed per unit time. 
It stands to reason that as the number of queries on 
the system increases, approaching a saturation level, 
the system throughput will also increase. The 
throughput is a good indicator of the capacity and 
productivity of a system. 

2. Utilization of a resource is the fraction of the time 
that the particular resource is busy. The main 
resources involved with processing database calls are 
the CPU, secbndary storage, and the communication 
channels between them. Utilization, as with system 
throughput, is directly related to the number of 
transactions on the system at one time. 

3. Responpe tim e can be taken to mean several different 
things. First, the response time could be considered 
as time-to-f irs;b - record . In other words, from the 
time the query enters the system until the time the 
first record in the response is returned. Another 
definition of response time could be time -to- last- 
record. This is from the time the query enters the 
system until the last record of the response is 
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available. Using this measurement would^ of courser 
cause the size of the result to influence the perfor- 
mance measure. If the system is I/O bound the time 
to retrieve and deliver the entire response could 
eclipse the actual transaction processing time. 



System throughput, utilization, and response time are 
all related in some sense. All three measurements tend to 
increase as the load on the system increases, but while a 
high system throughput, and a high utilization rate are per- 
ceived as desirable, a large response time carries a nega- 
tive connotation. 

Determining resource utilization and /or system 
throughput can be a very easy, or very difficult task 
depending upon the support offered by the system being test- 
ed. Some systems provide tools which offer easily accessi- 
ble statistics while others require a great deal of user in- 
tervention using software tools to acquire the necessary in- 
formation. Response time is usually the most readily avail- 
able measure and is also the most user apparent. Because of 
these facts response time is the measurement utilized in 
most benchmarks. 

The method used to perform the necessary calculations 
in determining response time is often a function of the sup- 
port offered by the system being tested. For example, a da- 
tabase system running on an operating system which allows a 
flexible interface could support a very detailed timing al- 
gorithm, while a system with limited interfacing ability may 
require the use of a much more general timing method (e.g., 
setting a time at query input and again at query completion 
and recording the difference) . 

4.4 4 2 Experimental Variables. 

A number of important tests can be performed in the 
benchmark by selecting and varying one or more dependent 
variables. The possible parameters that could be selected 
include: 



1. Database Size - Several sizes relevant to the system 
being tested should be selected. 

2. Query Complexity - Two factors are considered in 
determining query complexity. Greater complexity of 
the query predicate leads to increased parsing time 
and increases the potential for query optimization. 
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within each query set, the predicate complexity is 
increased by adding additional conditions. A method 
of complexity classification has been developed by 
Cardenas [CARD 73] . The complexity of a query predi- 
cate ^icreases in the following manner: 

a. An atomic condition simply places a simple selec- 
tion on a relation (e.g., Relationl .Attl = '10'). 

b. An item condition is a disjunction (OR) of two 
atomic conditions on the same attribute (e.g., 
Relationl .Attl = '10' OR Relationl .Attl = '20 ). 

c. A record condition is a conjunction (AND) of two 
item conditToni TeTg, , Relationl .Attl = '10' AND 
Relationl.Att2 = 'ABC'). 

d. A query condition is a disjunction (OR) of record 
conditions (e.g., Relationl .Attl = '10' OR 
Relationl.Att2 = ^ABC'). 

Second, the number of relations involved in a query 
indicates query complexity. More costly join opera- 
tions are required v^en multiple relations are in- 
volved. 

Records Retrieved - The response time of a transac- 
tion will depend greatly upon the number of records 
in the query result. 

Order of Query Execution - The different database 
systems use internal memory as buffers for the 
storage of needed indexes and intermediate results 
during query execution. To test the effect of the 
buffer memory on the order of query execution, job 
scripts should be formed which consist of similar 
transaction loads executed sequentially. This will 
identify any efficiencies caused by buffering. 

Indexing - Indexing should be tested at various lev- 
els. The use of at least three levels of indexes is 
recommended: no indexes, primary key indexes, and 
complete indexes. 

Sorting - Sorting costs should be tested. One method 
of doing this is to add 'order by' clauses to the 
query sets. By comparing sorted and unsorted 
queries, the costs of sorting in the different data- 
base systems can be determined. 
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7. Aggregation Functions - Aggregation functions should 
be tested by adding 'count' or 'max' to the output 
list. 

8. Number of Users - Multiple users contend for database 
system resources. This tends to increase the 
response time and increase the throughput. To study 
contention, tests should be run in which each user 
runs an identical job script. Other tests would in- 
clude different combinations of job scripts. Multi- 
ple user tests will also test the database system's 
capabilities to control concurrency. Concurrent up- 
dates on the same data item will test the locking 
protocols of the systems. 

9. Background Load - Tests should include runs varying 
the non-database jobs in the host computer system. 
The number and type of jobs in the background can be 
varied. Background jobs can be designed as CPU or 
I/O intensive jobs. Tests can determine the effect 
of these jobs on the performance of the database 
queries. By measuring the performance of the back- 
ground jobs under different query loads, the effect 
of database jobs on the background jobs can also be 
studied. This is called a reverse benchmark . 

10. Robustness - System performance should also be meas- 
ured under controlled failure conditions in the sys- 
tem. This includes simulating the conditions of loss 
of power, disk failure, and software bugs. The capa- 
bility to recover from these failures gracefully is 
an important feature of any database system. This 
includes the system's ability to recover on-going 
transactions, and to back out such transactions and 
request resubmission. Possible tests here include a 
deadlock test (it may or may not be easy to induce 
transaction deadlock) and disaster tests including a 
failed disk (demonstrate by powering down the disk), 
a failed system (power down the entire system), and 
an aborted transaction. 



A benchmark may be either application- specific, mean- 
ing that it is intended primarily to evaluate the suitabili- 
ty of the candidate database systems for a particular appli- 
cation; or it can be more general, in which case the experi- 
ment is intended to perform an overall evaluation. When 
selecting parameters such as the types of transactions, 
number of users, and background load, the type of benchmark 
that is being performed will have a direct relation on the 
parameters selected. For example, when selecting 
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transactions to be run, if the benchmark ir. app J icnt ion- 
specific, the application environment would bo studied and 
transactions modelled to duplicate this environment. In the 
more general case, a wide variety of transactions v; M\■\(^ be 
used with the intention of including the spectrum of actions 
typically performed in a user environment. 



5. BENCHMARK EXECUTION 



When the experiment has been formally defined the next 
step is to implement the design on each of the candidate 
sy steins. The actual execution of the benchmark can be bro- 
ken into three phases: benchmark initialization, benchmark 
verification, and benchmark testing. 



5.1 Benchmark Initialization 

Before any testing can be performed the benchmark must 
first be initialized. During initialization the database is 
prepared for testing and the benchmark runner program is 
readied for use. 



5. 1. 1 Loading . 

The first step in preparing for the benchmark is to 
load the database into each t)f the test systems. While 
loading may seem like a simple step, this is often not the 
case. In most benchmark experiments the database is 
transferred from one system to each of the systems to be 
tested. When this is the case caution must be used to avoid 
inconsistencies in the databases. Any transfer of data 
between two systems requires the use of transfer protocols. 
When transferring a small amount of data the protocol 
between the systems may handle the details quite well. Test 
databases for benchmarking are usually quite large and 
transferring can be a very lengthy task. While not major 
factors, power failures, surges, and hardware malfunctions 
can create inconsistencies in the transfer and result in in- 
correct data. 

If application data is used, the data must be reformat- 
ted into a usable form for the test system. The re for ma ting 
of data can sometimes lead to unexpected problems which 
result in either the loading of unusable data, or not being 
able to load the data. If the data loaded into the test 
system turns out to be incorrect it may have to be dumped 
and the loading reinitialized with the necessary correc- 
tions. When loading large databases this can be a costly 
and time-consuming process. 

Using a synthetic database does not eliminate the load- 
ing problems. If the database is generated in one system 
and subsequently transferred to the test systems, the same 
concerns as above apply. If, on the other hand, the data- 
base is generated on the test system, the consistency of the 
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test data must still be verified. 



5.1.2 Tinting. 

It is also in this initialization phase that the timing 
algorithm in the benchmark runner is chosen, designed, and 
tested. In section 4 .3 a detailed discussion of the possi- 
ble measurement criteria to be returned by the timing algo- 
rithm is presented. In this phase the mechanism to be used 
is considered and tested. In the analysis of a single sys- 
tem, or compatible systems, the timing algorithm chosen 
should provide as detailed results as possible. Whether 
hardware monitors or software monitors are to be used, the 
initialization phase should allow for the tuning and testing 
of the tool. 

V?hen benchmarking several systems, the measurement cri- 
teria chosen must be implemented across all systems. The 
lack of ability to support a detailed timing algorithm by 
one system will sometimes limit the criteria that can be 
evaluated in comparisons. To insure that the timing method 
chosen is implemen table across all test systems, the moni- 
toring should be tested prior to any actual testing. 



5.2 Benchmark Verification 

Results obtained from benchmark experiments must be 
verified in order to be of any value. Three types of verif- 
ication are discussed below. 



Equivalence Verification. Each transaction is coded 
in the data manipulation language (e.g., SQIi) of the 
different systems to be tested. It must be verified 
that the transactions are equivalent across all of 
the different systems to be tested. This equivalence 
can be tested by executing the transaction and check- 
ing that the results are correct and identical on all 
system. A more rigorous approach would be to prove, 
via semantic proving techniques, that the transac- 
tions are equivalent and will always produce identi- 
cal results. 

2* Optimum Performance Verification . In order to 
achieve fairness in the benchmarking, it should be 
verified that the transactions are coded so that the 
best performance can be realized in a typical confi- 
guration of each system. The existence and use of 
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access paths (eg., index structures) should be as 
close to identiCcN* as possible in the different sys- 
tems. The Si ?v vf:i*^ will be set up with nornial ainounts 
of processing po . . r and storage as would be found in 
a typical configuration. In this way, each system 
will be shown in its ^best light^. A ^nethoa of ac- 
complishing this performance verification is to exe- 
cute the transactions (after equivalence verifica- 
tion) on each system and collect performance results. 
Each vendor should be asked to evaluate the perfor- 
mance of his system and to suggest ways to tune the 
system for better performance. 

3. Consis t ency Verificatio n. During the execution of 
the be ichmark experiments, a method for checking the 
consistency of performcince results should be included 
in the experimental dersign. A consistency check con- 
sists of running a particular benchmark more than 
once and verifying f:hat the performance results are 
consistent between th« runs. While not all benchmark 
runs need to be duplicated, performing selected con- 
sistency checks will provide some assurance that the 
performance results are a function of the defined 
system environment. 



The implementation of each job script must be verified 
both for correctness of its semantics and optimality of per- 
formance. When multiple database systems are being tested, 
the scope for verification measures is increased. In order 
to verify that the proper transactions are being used, the 
results of the queries (e.g., record counts, text of the re- 
turned fields} can be compared across database systems. Any 
discrepancies obt^erved will trigger an inspection of the of- 
fending scripts. Common errors are: a missing qualification 
sub-clause, an improper sort order, or a syntactically 
correct but semantically incorrect operator in a selection 
clause. 

Scripts may be changed during the initial check --out 
phase. A single benchmark experiment will run one or more 
scripts simultaneoiiSly on a target database system. Each 
script will be monitored so that the system response time 
which it experiences can be recorded. Simultaneously, 
statistics describing overall system throughput will be 
recorded. The choice of scripts to run together will be 
determined based upon the behavior in the database systems 
that are being tested. This is one area in which it is ex- 
pected that preliminary evaluation of the benchmark results 
will feed back into the experiment, as new groups of scripts 
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will be suggested and tried out to probe specific perfor- 
mance features* 

The details involving the transaction verif ication, as 
well as those involved with running the benchmark r will have 
a direct relation on how well the results from each system 
can be analyzed and compared to one another* 



5.3 Benchmark Testing 

Once the validity of the database has been assured, it 
is time to begin planning the benchmark runs on each of the 
systems to be tested. Differences between systems will 
necessitate differences in the approach to each implementa- 
tion. For example, the limitations of a particular system^s 
architecture may allow only a subset of the experiment to be 
applied. So, while the methodology flowchart presented ear- 
lier shows a clean interface between the design and execu- 
tion phases, in reality this is rarely, if ever, the case 
when implementing the benchmark on more than one system. It 
will often be necessary to return to the design phase in 
o: ^^r to restrict or revise the planned testing on each par- 
t alar system to be tested. 

Any general design of a benchmark will encounter prob- 
lems specific to a given environment. The variations of the 
hardware and operating system environments, as well as the 
particular database system, will cause the experiment to 
vary from its originax design. In an application-oriented 
benchmark any system which lacks the functionality to per- 
form all of the desired operations presumably has already 
been eliminated during the features analysis phase. Howev- 
er, a general benchmark may include a diverse set of sys- 
tems, some of which cannot perform all of the tests, or may 
perform limited versions of them. 

The hardware involved can cause departures from what is 
desired. If, for instance, there is a limited amount of 
main storage available, the operating system and the data- 
base system taken together may preclude the testing of any 
type of background load altogether. Also, limited amounts 
of main storage reduce the possible complexity of the data- 
base system. Interaction between the various hardware/ 
operating system/ database system components can have 
deleterious effects. For example, the conflicting seeks 
used by an operating system and the DBMS system software are 
shown to have degraded performance of benchmark tests run by 
System Development Corporation [LUND 82] . 



In the case of a small computer database systemr the 
limited amount of main storage may not allow the database 
system enough space to have the code to implement all of the 
functions specified in the scripts. Aggregate functions 
(e.g.r MAXr MINr AVERAGE) are not present in the query 
languages of many database systems. More exotic (but use- 
ful) operations such as an ""outer join"* are present in very 
few of today ''s systems. 

The benchmark tests must be carefully monitored during 
their execution^ and as knowledge is gained from the experi- 
mentSr it is expected that the original experiment will be 
redesigned to take advantage of that knowledge. Hence r run- 
ning the benchmarks is not a completely ''cut-and-dried'' 
task. Most of the benchmark involves running scripts 
against each of the target systems^ while varying individual 
parameters of the systems. After each runr the results will 
be scanned to verify that embedded consistency checks have 
not uncovered any anomalies. 

Once the experiments are running smoothly ^ the effort 
of interpreting the data will beg in r and henceforth^ the 
processes of gathering data and interpreting it can proceed 
in parallel. The results obtained should suggest new combi- 
nations of parameters and scripts^ or variations on old 
oneS/ which will be run to probe specific aspects of a 
system'^s performance. 
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6, BENCHMARK ANALYSIS 



The final phase of benchmarking is the analysis of 
results. Benchmark testing often produces large amounts of 
raw data which must be condensed and analyzed. Evaluation 
of the data generated during benchmarking must begin before 
the tests have been completed. This provides feedback dur- 
ing the testing by suggesting which types of experiments 
need to be repeated in more detail, or should be extended in 
some way. Summarizing the meaningful information from these 
results and discussing them in a report form is a key step 
in the benchmark testing. Explanations are provided for any 
significant findings and graphs and tables showing the com- 
parison of results are included. Analyses are made on both 
individual systems, comparing results by varying parameters, 
and between systems, comparing oner's results to the other ^s 
results. The following section discusses some of the possi- 
ble comparisons to be drawn when analyzing a single system 
or when comparing several systems* 

Each test parameter should be evaluated in as isolated 
an environment as possible so that the results can be 
directly attributable to the current configuration of param- 
eters. A matrix of parameters to be evaluated (e.g., data- 
base size, background load, etc.) should be designed and 
performance benchmarks run for each combination. 

When the testing is complete, results of each system 
are thoroughly analyzed. The parameters defined as those to 
be varied in the testing are to be monitored, and their 
behavior summarized. Graphs are utilized to demonstrate 
these behaviors. Graphs provide a clear, concise synopsis 
of the relationship between parameters and should be util- 
ized frequently in the analysis phase. 

When evaluating a single system, a variety of comparis- 
ons should be studied in order to identify interesting 
resalts. Below are several possible comparisons and effects 
that should be included in the analysis. 



1. Response Time vs . Query Complexity > Under normal 
conditions, the relationship between these two param- 
eters should be an increase in response time as the 
query becomes more complex due to an increase in 
parsing and execution. This is shown when consider- 
ing the time-to-first-record response time statistic. 
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2. Response Time vs. Records Retrieved. This relation- 
ship should provide an interesting result regarding 
the time relationship when retrieving increasing 
numbers of records. 

3. Response Time vs. Indexing , indexing should have a 
positive effect (decreasing response time) for most 
queries but this is not always the case. Some index- 
ing may actually cause the response time to increase 
if the query accesses a large percentage of the rela- 
tion (high hit ratio ) . 

4. Buffering. Efficient use of buffer space can some- 
t lines lead to improved performance. However, this re- 
quires the use. of special buffer management algo- 
rithms which are not implemented in most database 
systems. 

5. Sorting. Sorting can be an expensive process. A 
test showing the difference in a query when it is 
sorted vs. when no sorting is required can identify 
the system's strength in this area. 

6. Aggregation. The resulting increase in cost as the 
number of records retrieved increases using an aggre- 
gate function should be documented. 

' Multi-user Results. The effects of multi-users on 
the database are an important and realistic testing 
parameter. The amount of increase in response time 
as the number of users on the system increases should 
be calculated and graphed. 

8. Background Load Results. In much the same manner as 
the multi-user environment, the background load can 
have a dramatic effect on the response time. As the 
background load increases the resulting increase in 
response time on the database system workload should 
be monitored. 

9. Reverse Be nchmark . The database workload will have 
an effect on the performance of applications on host 
computer systems. An analysis of this effect should 
be performed. 



Performance saturation points will occur when a system 
shows a marked decrease in performance based upon a resource 
becoming overloaded. System saturation points should be 
identified and plotted for each of the systems tested. The 
saturation level will be a function of the number of users. 



types of workload, background load, and other tested parame- 
ters. Therefore, an explanation of the saturation points on 
each configuration and some comments regarding the level are 
necessary to backup the graphs. 

Finally, the end product of any benchmark experiment 
should be a report which summarizes the interesting findings 
of the testing, discusses the reasons behind the findings, 
and draws comparisons between the systems tested. If any 
possible solutions exist to problems identified in the 
benchmark they should be recommended in writing. This re- 
port should stress general, rather than specific, results 
and therefore provide an overall evaluation of the systems. 



7. SUMMARY AND CONCLUSIONS 



This report has presented a general methodology for the 
benchmarking of database systems. Previous projects on da- 
tabase system benchmarks, surveyed in Section 2, have iden- 
tified many different factors that influence database per- 
formance. The objective in this report has been to describe 
a framewrk into which these many database system parameters 
can be fitted. Three principal phases of a database system 
benchmark have been identified. 



1. Benchmaik des ign includes the design of the system 
configuration, the test data, and the benchmark work- 
load. Thfsse parameters are controlled in the experi- 
mental design of the benchmark. Performance measures 
and the means to gather the performance statistics 
are selected. 

2. Benchmark execution implements the design on one or 
more database systems. This phase requires strict 
verification procedures and may involve feedback for 
improving the benchmark design. 

3. Benchmark analysis is the phase in which the raw per- 
formance data is studied and observations and conclu- 
sions are made. Single system analysis and multiple 
system comparisons form the result of the benchmark. 



The design of a benchmark methodology is a complex 
task. Previous work on database system benchmarks has been 
applied to selected cases of interest, in this methodology 
an attempt has been made to present a framework in an order- 
ly, top-down fashion to assist the designer of a benchmark 
experiment in the design and implementation of a benchmark. 

In attempting to design a generalized, standard ap- 
proach to benchmarking, the complexity of the actual task of 
designing a specific benchmark must be taken into account. 
No generalized methodology car? provide a complete list of 
considerations for the design of an actual experiment. In- 
stead, the methodology can only provide the user with as 
comprehensive a list of system parameters as possible. Each 
experiment and each system has its own characteristics and 
constraints. While the methodology will help the designer 
by providing a comprehensive framework for the benchmark, it 
is the designer's task to fit the particular aspects of each 
database system, application environment, and operating 
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constraints into a viable benchmark study. 
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